Outlier Detection Using Nonconvex Penalized Regression

Authors

  • Yiyuan She
  • Art B. Owen
Abstract

This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Θ) based iterative procedure for outlier detection (Θ-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that Θ-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most O(np) (and sometimes much less), avoiding an O(np²) least squares estimate. We describe the connection between Θ-IPOD and M-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned Θ-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with p ≫ n, if both the coefficient vector and the outlier pattern are sparse.
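
The thresholding iteration described in the abstract can be illustrated with a short sketch. The update rule used here, gamma <- Θ(y - H(y - gamma); λ) with H the projection onto the column space of X, is an assumption chosen to be consistent with the abstract (one mean shift parameter per observation, hard thresholding, O(np) work per iteration after a one-time factorization); it is not quoted from the paper. The names `hard_threshold` and `ipod_sketch`, the zero starting point, the fixed threshold `lam`, and the toy data are all hypothetical, and X is assumed to have full column rank.

```python
import numpy as np


def hard_threshold(r, lam):
    """Hard thresholding: keep entries with |r_i| > lam, set the rest to zero."""
    return np.where(np.abs(r) > lam, r, 0.0)


def ipod_sketch(X, y, lam, max_iter=500, tol=1e-8):
    """Illustrative thresholding iteration for y = X beta + gamma + noise.

    Assumed update (not quoted from the paper):
        gamma <- Theta(y - H (y - gamma); lam),
    where H projects onto the column space of X.
    """
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)  # reduced QR, computed once; Q is n x p
    gamma = np.zeros(n)     # assumed starting point: no outliers flagged
    for _ in range(max_iter):
        # Residuals of y after fitting the shift-corrected response y - gamma.
        # Two matrix-vector products with Q, so O(np) per iteration.
        resid = y - Q @ (Q.T @ (y - gamma))
        new_gamma = hard_threshold(resid, lam)
        converged = np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    # Coefficients from a least squares fit to the shift-corrected response.
    beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
    return beta, gamma


# Toy data (hypothetical): a line with small noise and three shifted points.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(n)
y[[20, 25, 30]] += 10.0

beta_hat, gamma_hat = ipod_sketch(X, y, lam=3.0)
print("flagged observations:", np.flatnonzero(gamma_hat))
print("estimated coefficients:", beta_hat)
```

Observations with a nonzero entry in `gamma_hat` are the ones flagged as outliers. The threshold is fixed by hand in this sketch; the abstract states that a data-dependent choice can be made based on BIC, which is not implemented here.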

Similar resources

An Iterative Coordinate Descent Algorithm for High-Dimensional Nonconvex Penalized Quantile Regression

We propose and study a new iterative coordinate descent algorithm (QICD) for solving nonconvex penalized quantile regression in high dimension. By permitting different subsets of covariates to be relevant for modeling the response variable at different quantiles, nonconvex penalized quantile regression provides a flexible approach for modeling high-dimensional data with heterogeneity. Although ...

A Penalized Trimmed Squares Method for Deleting Outliers in Robust Regression

We consider the problem of identifying multiple outliers in linear regression models. In robust regression the unusual observations should be removed from the sample in order to obtain better fitting for the rest of the observations. Based on the LTS estimate, we propose a penalized trimmed square estimator PTS, where penalty costs for discarding outliers are inserted into the loss function. We...

Penalized unsupervised learning with outliers.

We consider the problem of performing unsupervised learning in the presence of outliers - that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to...

Analysis Methods for Supersaturated Design: Some Comparisons

Supersaturated designs are very cost-effective with respect to the number of runs and as such are highly desirable in many preliminary studies in industrial experimentation. Variable selection plays an important role in analyzing data from the supersaturated designs. Traditional approaches, such as the best subset variable selection and stepwise regression, may not be appropriate in this situat...

Outlier Detection by Boosting Regression Trees

A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...

Journal title:
  • CoRR

Volume abs/1006.2592  Issue 

Pages  -

Publication date 2010